Automatic Generation of Masked Microdata
Abstract
Disclosure control is the discipline concerned with modifying data that contain confidential information about individual entities, such as persons, households, or businesses, so that third parties working with the data cannot recognize those entities and thereby disclose information about them. In very broad terms, disclosure risk is the risk that a given form of disclosure will occur if a masked microdata set is released. Microdata consist of a series of records, each containing information on an individual unit. Several microdata disclosure control frameworks exist in the literature, but they focus on specific disclosure problems. Our proposed framework attempts to define the microdata disclosure control problem more generally. In this paper we describe the architecture of a software system called AMMG (Automatic Masked Microdata Generator), which is intended to generate masked microdata with low disclosure risk and low information loss. We propose a general framework for microdata disclosure control for this system and extend existing disclosure risk measures. Variables in the microdata are classified at two levels: one specified by the data owner, the other indicating the knowledge states of potential data intruders. These classifications form the basis for organizing disclosure risk scenarios. The disclosure risk measure presented in this paper is validated in our illustrations.
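To make the two-level classification concrete, here is a minimal sketch. It is not the AMMG system or the paper's risk measure: the toy records, the labels `key` and `confidential`, the scenario names, and the function `uniqueness_risk` are all assumptions chosen for illustration. The risk used here is a simple stand-in, the share of records that are unique on the variables an intruder is assumed to know.

```python
from collections import Counter

# Hypothetical toy microdata: one record per individual unit.
records = [
    {"age": 34, "zip": "48201", "sex": "F", "income": 52000},
    {"age": 34, "zip": "48201", "sex": "F", "income": 61000},
    {"age": 51, "zip": "48202", "sex": "M", "income": 75000},
    {"age": 29, "zip": "48203", "sex": "F", "income": 43000},
]

# Level 1 (assumed labels): the data owner's classification of each variable.
owner_class = {"age": "key", "zip": "key", "sex": "key", "income": "confidential"}


def uniqueness_risk(records, known_vars):
    """Share of records that are unique on the variables the intruder knows.

    This is a generic stand-in measure, not the one defined in the paper.
    """
    if not records:
        return 0.0
    combos = Counter(tuple(r[v] for v in known_vars) for r in records)
    unique = sum(1 for r in records if combos[tuple(r[v] for v in known_vars)] == 1)
    return unique / len(records)


# Level 2 (assumed): intruder knowledge states, each a subset of the key variables.
scenarios = {
    "knows_sex_only": ["sex"],
    "knows_all_keys": [v for v, c in owner_class.items() if c == "key"],
}

# One risk value per disclosure risk scenario.
risk_by_scenario = {
    name: uniqueness_risk(records, known) for name, known in scenarios.items()
}
```

Organizing the computation by scenario mirrors the idea that the same masked file can carry different risk depending on what an intruder already knows; a masking method would then be judged by how far it lowers these values against the information it destroys.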
Similar resources
Post-Masking Optimization of the Tradeoff between Information Loss and Disclosure Risk in Masked Microdata Sets
Previous work by these authors has been directed to measuring the performance of microdata masking methods in terms of information loss and disclosure risk. Based on the proposed metrics, we show here how to improve the performance of any particular masking method. In particular, post-masking optimization is discussed for preserving as much as possible the moments of first and second order (and...
Reverse Mapping to Preserve the Marginal Distributions of Attributes in Masked Microdata
In this paper we describe a new procedure that is capable of ensuring that the marginal distributions of attributes in microdata masked with a masking mechanism end up being the same as the marginal distributions of attributes in the original data. We illustrate the application of the new procedure using several commonly used masking mechanisms.
Global Measures of Data Utility for Microdata Masked for Disclosure Limitation
When releasing microdata to the public, data disseminators typically alter the original data to protect the confidentiality of database subjects’ identities and sensitive attributes. However, such alteration negatively impacts the utility (quality) of the released data. In this paper, we present quantitative measures of data utility for masked microdata, with the aim of improving disseminators’...
Modeling and Quality of Masked Microdata
Statistical organizations collect data via survey forms and other methods. The microdata are valuable for modeling and analysis. To produce a public-use file, the organizations mask the data in a manner that may prevent re-identification of data associated with individual entities. The public-use microdata may allow one or two sets of analyses that approximately reproduce analyses that could be...
Controlled shuffling, statistical confidentiality and microdata utility: a successful experiment with a 10% household sample of the 2011 population census of Ireland for the IPUMS-International database
IPUMS-International disseminates more than two hundred fifty integrated, confidentialized census microdata samples to thousands of researchers worldwide at no cost. The number of samples is increasing at the rate of several dozen per year, as quickly as the task of integrating metadata and microdata is completed. Protecting the statistical confidentiality and privacy of individuals represented...